FFmpeg Beginner's Study Notes (2): The Video Stream Decoding Process

2023-03-24 22:40 | Source: collected from the web

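Video decoding process

Before decoding the video stream in a multimedia file, let's first look at how streaming media data is played back. The decoding steps follow directly from that playback pipeline.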

[Figure: video decoding process]

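Audio/video playback consists of these stages: protocol handling -> demuxing -> decoding -> audio/video synchronization -> playback. When the file being played is a local file, the protocol-handling stage is not needed.

The corresponding data goes through these forms: multimedia file -> stream -> packet -> frame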

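Blue blocks represent the actual data

Purple blocks represent data formats

Orange blocks represent the protocol the data corresponds to

White blocks represent the operations performed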

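The detailed flow for obtaining the video stream data of a multimedia file with FFmpeg is shown in the figure below:

[Figure: detailed decoding flow for a video stream]

Next, we follow this flow and decode the video stream with FFmpeg. The code is as follows: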

    extern "C" {
    #include "libavcodec/avcodec.h"
    #include "libavformat/avformat.h"
    #include "libswscale/swscale.h"
    #include "libavutil/imgutils.h"
    }
    #include <cstdio>

    int main() {
        int ret = 0;
        // Path of the input file
        const char* filePath = "target.mp4";
        // Objects used throughout decoding
        AVFormatContext* fmtCtx = NULL;
        AVCodecContext* codecCtx = NULL;
        AVCodecParameters* avCodecPara = NULL;
        const AVCodec* codec = NULL;
        AVPacket* pkt = NULL;   // packet
        AVFrame* frame = NULL;  // frame

        do {
            //----------------- Create the AVFormatContext -------------------
            // It describes the structure and basic information of the media file or stream
            fmtCtx = avformat_alloc_context();

            //----------------- Open the local file -------------------
            ret = avformat_open_input(&fmtCtx, filePath, NULL, NULL);
            if (ret) {
                printf("cannot open file\n");
                break;
            }

            //----------------- Read the stream information -------------------
            ret = avformat_find_stream_info(fmtCtx, NULL);
            if (ret < 0) {
                printf("Cannot find stream information\n");
                break;
            }

            // Walk the streams until a video stream is found and remember its index
            int videoIndex = -1;
            for (unsigned int i = 0; i < fmtCtx->nb_streams; i++) {
                if (fmtCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
                    videoIndex = (int)i;
                    break;
                }
            }
            // videoIndex == -1 means no video stream was found
            if (videoIndex == -1) {
                printf("cannot find video stream\n");
                break;
            }

            // Print the stream information
            av_dump_format(fmtCtx, 0, filePath, 0);

            //----------------- Find the decoder -------------------
            avCodecPara = fmtCtx->streams[videoIndex]->codecpar;
            // Assign to the outer variable; the original redeclared codec here and shadowed it
            codec = avcodec_find_decoder(avCodecPara->codec_id);
            if (codec == NULL) {
                printf("cannot find decoder\n");
                break;
            }
            // Create the decoder context and fill it from the stream parameters
            codecCtx = avcodec_alloc_context3(codec);
            if (codecCtx == NULL) {
                printf("cannot allocate decoder context\n");
                break;
            }
            ret = avcodec_parameters_to_context(codecCtx, avCodecPara);
            if (ret < 0) {
                printf("parameters to context fail\n");
                break;
            }

            //----------------- Open the decoder -------------------
            ret = avcodec_open2(codecCtx, codec, NULL);
            if (ret < 0) {
                printf("cannot open decoder\n");
                break;
            }

            //----------------- Allocate the AVPacket and AVFrame -------------------
            pkt = av_packet_alloc();
            frame = av_frame_alloc();

            //----------------- Read the video frames -------------------
            int frameCount = 0;  // number of decoded video frames
            while (av_read_frame(fmtCtx, pkt) >= 0) {  // reads one packet of encoded data into pkt
                // Does the packet belong to the video stream?
                if (pkt->stream_index == videoIndex) {
                    // Send the packet to the decoder
                    ret = avcodec_send_packet(codecCtx, pkt);
                    if (ret == 0) {
                        // One packet may yield zero, one or several frames.
                        // Example: a B-frame in H.264 references both earlier and later frames,
                        // so it produces no output until the following frame has been read,
                        // at which point two frames become available.
                        while (avcodec_receive_frame(codecCtx, frame) == 0) {
                            // The image data of the frame is available here in frame->data.
                            // It can be displayed with OpenCV, OpenGL or SDL,
                            // or saved to a file (a file header must be added).
                            frameCount++;
                        }
                    }
                }
                av_packet_unref(pkt);  // reset pkt for the next read
            }
            // Frames are still buffered in the decoder; send a NULL packet to flush them
            ret = avcodec_send_packet(codecCtx, NULL);
            if (ret == 0) {
                while (avcodec_receive_frame(codecCtx, frame) == 0) {
                    frameCount++;
                }
            }
            printf("There are %d frames int total.\n", frameCount);
        } while (0);

        //----------------- Release everything -------------------
        // avcodec_free_context also frees the context itself; avcodec_close alone would leak it
        avcodec_free_context(&codecCtx);
        avformat_close_input(&fmtCtx);
        av_packet_free(&pkt);
        av_frame_free(&frame);
        return 0;
    }

Output:

    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'target.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 512
        compatible_brands: isomiso2avc1mp41
        encoder         : Lavf58.48.100
      Duration: 00:03:10.36, start: 0.000000, bitrate: 773 kb/s
        Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 442 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
        Metadata:
          handler_name    : VideoHandler
        Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 325 kb/s (default)
        Metadata:
          handler_name    : SoundHandler
    There are 4759 frames int total.

The frame count agrees with the reported duration: 4759 frames ÷ 25 fps + 0.0 s (start) = 190.36 s ⇒ 00:03:10.36

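FFmpeg treats JPG and PNG image files as video streams with a single frame, so the same decoding flow can also be used to read image data.

Code walkthrough

Structures

The structures involved in the code above are AVFormatContext, AVCodecParameters, AVCodecContext, AVCodec, AVPacket and AVFrame.

AVFormatContext has already been covered, so it is not repeated here.

AVCodecParameters and AVCodecContext

In newer versions of FFmpeg, AVStream.codecpar (struct AVCodecParameters) replaces AVStream.codec (struct AVCodecContext). AVCodecParameters was split out of AVCodecContext; it contains no function pointers and only stores the parameters the decoder needs.

AVCodecContext remains indispensable for encoding and decoding.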

    // Excerpt: a few of the more important fields
    typedef struct AVCodecParameters {
        enum AVMediaType codec_type;  // media type (video, audio, ...)
        enum AVCodecID codec_id;      // identifies the specific codec
        int64_t bit_rate;             // average bit rate
        int sample_rate;              // sample rate (audio)
        int channels;                 // number of channels (audio)
        uint64_t channel_layout;      // channel layout (audio)
        int width, height;            // width and height (video)
        int format;                   // pixel format (video) / sample format (audio)
        ...
    } AVCodecParameters;

    typedef struct AVCodecContext {
        // Every property in AVCodecParameters also exists in AVCodecContext
        const struct AVCodec *codec;     // the codec in use (H.264, MPEG2, ...)
        enum AVSampleFormat sample_fmt;  // sample format (audio)
        enum AVPixelFormat pix_fmt;      // pixel format (video)
        ...
    } AVCodecContext;

avcodec_parameters_to_context copies the parameters of an AVCodecParameters into an AVCodecContext.

AVCodec

AVCodec is the decoder structure; each instance corresponds to one specific decoder.

    // Excerpt: a few of the more important fields
    typedef struct AVCodec {
        const char *name;                         // short codec name (e.g. "h264")
        const char *long_name;                    // full codec name (e.g. "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10")
        enum AVMediaType type;                    // media type: video, audio or subtitle
        enum AVCodecID id;                        // identifies the specific codec
        const AVRational *supported_framerates;   // supported frame rates (video only)
        const enum AVPixelFormat *pix_fmts;       // supported pixel formats (video only)
        const int *supported_samplerates;         // supported sample rates (audio only)
        const enum AVSampleFormat *sample_fmts;   // supported sample formats (audio only)
        const uint64_t *channel_layouts;          // supported channel layouts (audio only)
        ...
    } AVCodec;

AVPacket

AVPacket: the structure that stores data before decoding, i.e. a packet.

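    // Excerpt: a few of the more important fields
    typedef struct AVPacket {
        AVBufferRef *buf;   // manages the buffer that data points to
        uint8_t *data;      // the compressed (encoded) data
        int size;           // size of data
        int64_t pts;        // presentation timestamp
        int64_t dts;        // decoding timestamp
        int stream_index;   // the video/audio stream this packet belongs to
        ...
    } AVPacket;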

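Memory management of AVPacket

An AVPacket does not itself contain the compressed data; its data pointer refers to a separate buffer.

Several AVPackets can share the same data buffer (AVBufferRef, AVBuffer).

    av_read_frame(pFormatCtx, packet);  // read a packet
    av_packet_ref(dst_pkt, packet);     // dst_pkt and packet now share the same data buffer; reference count +1
    av_packet_unref(dst_pkt);           // release dst_pkt's reference to the data buffer; reference count -1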

AVFrame

AVFrame: the structure that stores data after decoding, i.e. a frame.

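    // Excerpt: a few of the more important fields
    typedef struct AVFrame {
        uint8_t *data[AV_NUM_DATA_POINTERS];  // decoded raw data (YUV or RGB for video, PCM for audio)
        int linesize[AV_NUM_DATA_POINTERS];   // size in bytes of one "row" of data; not necessarily equal
                                              // to the image width, often larger because of padding
        int width, height;                    // frame width and height (1920x1080, 1280x720, ...)
        int format;                           // format of the decoded data (YUV420, YUV422, RGB24, ...)
        int key_frame;                        // whether this is a key frame
        enum AVPictureType pict_type;         // picture type (I, B, P, ...)
        AVRational sample_aspect_ratio;       // aspect ratio of a single pixel (sample), e.g. 1:1
        int64_t pts;                          // presentation timestamp
        int coded_picture_number;             // picture number in coded order
        int display_picture_number;           // picture number in display order
        int nb_samples;                       // number of audio samples
        ...
    } AVFrame;

Functions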

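avcodec_find_decoder

avcodec_find_decoder looks up the decoder that matches a codec ID.

    AVCodec *avcodec_find_decoder(enum AVCodecID id);             // find a decoder by ID
    AVCodec *avcodec_find_decoder_by_name(const char *name);      // find a decoder by name
    /* Encoders are the counterpart of decoders and have matching lookup functions */
    AVCodec *avcodec_find_encoder(enum AVCodecID id);             // find an encoder by ID
    AVCodec *avcodec_find_encoder_by_name(const char *name);      // find an encoder by name
    /* Note: since FFmpeg 5.0 these functions return const AVCodec * */

Parameters:

enum AVCodecID id: the codec ID, which can be taken from AVCodecParameters

Return value:

A pointer to an AVCodec, or NULL if no matching decoder is found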

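avcodec_alloc_context3

avcodec_alloc_context3 allocates an AVCodecContext and sets its fields to the default values for the given decoder.

    AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);

Parameters:

const AVCodec *codec: the decoder; private data is allocated and defaults are initialized according to it

Return value:

A pointer to an AVCodecContext, or NULL if allocation fails

Incidentally, what does the 3 in the name avcodec_alloc_context3 mean? It is a version suffix: FFmpeg appends a number when a function is replaced by a newer, incompatible variant, as is also the case for avcodec_open2.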

avcodec_parameters_to_context

avcodec_parameters_to_context assigns the properties of an AVCodecParameters to an AVCodecContext.

    int avcodec_parameters_to_context(AVCodecContext *codec, const AVCodecParameters *par) {
        // copy the properties of par into codec
        codec->codec_type = par->codec_type;
        codec->codec_id   = par->codec_id;
        codec->codec_tag  = par->codec_tag;
        ...
    }

Parameters:

AVCodecContext *codec: the AVCodecContext to be filled in
const AVCodecParameters *par: the AVCodecParameters that supplies the values

Return value:

A value ≥ 0 on success, a negative value on failure

avcodec_open2

avcodec_open2 opens an audio or video decoder.

    int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options) {
        ...
        avctx->codec = codec;
        ...
    }

Parameters:

AVCodecContext *avctx: an AVCodecContext that has already been initialized
const AVCodec *codec: the decoder to open for this AVCodecContext; the context will use it for decoding from then on
AVDictionary **options: additional options; NULL is usually sufficient

Return value:

0 on success, a negative value on failure

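av_read_frame

av_read_frame reads encoded audio/video data, i.e. it fetches one AVPacket from the stream. It splits the content stored in the file into packets and returns one packet per call.

    int av_read_frame(AVFormatContext *s, AVPacket *pkt);

Parameters:

AVFormatContext *s: the AVFormatContext of the opened input
AVPacket *pkt: receives the packet; it refers to the data buffer through its data pointer and does not store the data itself

Return value:

0 on success, a negative value on error or at end of file

Why is the function called av_read_frame rather than av_read_packet? Early FFmpeg had no concept of a packet, only of frames before and after encoding, which were hard to tell apart. The concept of a packet was introduced later, but the function name was kept, out of habit or for backward compatibility.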

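avcodec_send_packet

avcodec_send_packet sends one packet to the decoder for decoding.

    int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);

Parameters:

AVCodecContext *avctx: the codec context; its decoder must have been opened with avcodec_open2
const AVPacket *avpkt: the packet to decode

Return value:

0 on success, a negative error code on failure. Error values:

AVERROR(EAGAIN): input is not accepted in the current state; read the pending output with avcodec_receive_frame, then send the packet again
AVERROR_EOF: the decoder has been flushed and no new packets can be sent to it
AVERROR(EINVAL): the decoder is not opened, or it is an encoder
AVERROR(ENOMEM): the packet could not be added to the internal queue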

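avcodec_receive_frame

avcodec_receive_frame fetches decoded audio/video data (raw data such as YUV and PCM).

    int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);

Parameters:

AVCodecContext *avctx: the codec context
AVFrame *frame: the frame that receives the decoded audio/video data

Return value:

0 on success; any other value indicates failure. Error values:

AVERROR(EAGAIN): no output is available in this state; new input must be sent first
AVERROR_EOF: the decoder has been fully flushed and there will be no more output frames
AVERROR(EINVAL): the decoder is not opened, or it is an encoder

There is no need to call av_frame_unref on the frame before calling avcodec_receive_frame, because the function itself calls av_frame_unref(frame) before doing anything else.

Strictly speaking, any error other than AVERROR(EAGAIN) and AVERROR_EOF should be treated as fatal and should stop decoding.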

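Functions for releasing resources

    avcodec_free_context(&codecCtx);  // closes the codec and frees the context; avcodec_close alone does not free it
    avformat_close_input(&fmtCtx);
    av_packet_free(&pkt);
    av_frame_free(&frame);

Only resources that we allocated ourselves need to be released by us. FFmpeg releases the rest itself, and freeing something a second time is an error.

For example, AVCodecParameters is an internal resource of AVFormatContext, so we do not release it manually: avformat_close_input frees it, and we must not call avcodec_parameters_free on it.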

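Displaying video frames with OpenCV

I have been learning image processing with OpenCV recently, and its workflow is simpler than that of SDL, OpenGL or ANativeWindow, so OpenCV is used here to display the video frames.

The libswscale library handles image scaling as well as color space and pixel format conversion. The SwsContext structure is used throughout the conversion and holds the parameters it needs.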

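    // Excerpt: a few of the more important fields
    typedef struct SwsContext {
        int srcW;                       // width of the luma/alpha planes of the source image
        int srcH;                       // height of the luma/alpha planes of the source image
        int dstH;                       // height of the luma/alpha planes of the destination image
        int dstW;                       // width of the luma/alpha planes of the destination image
        int chrSrcW;                    // width of the chroma planes of the source image
        int chrSrcH;                    // height of the chroma planes of the source image
        int chrDstW;                    // width of the chroma planes of the destination image
        int chrDstH;                    // height of the chroma planes of the destination image
        enum AVPixelFormat dstFormat;   // destination format, e.g. YUV420P, YUV444, RGB, RGBA, GRAY
        enum AVPixelFormat srcFormat;   // source format
        int needAlpha;                  // whether an alpha channel is present
        int flags;                      // flags selecting the algorithm, optimizations and subsampling
        ...
    } SwsContext;

Functions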

sws_getContext

sws_getContext creates and returns a SwsContext.

    struct SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                      int dstW, int dstH, enum AVPixelFormat dstFormat,
                                      int flags, SwsFilter *srcFilter,
                                      SwsFilter *dstFilter, const double *param);

Parameters:

int srcW: width of the source image
int srcH: height of the source image
enum AVPixelFormat srcFormat: format of the source image, e.g. YUV420P, YUV444, RGB, RGBA, GRAY
int dstW: width of the destination image
int dstH: height of the destination image
enum AVPixelFormat dstFormat: format of the destination image
int flags: selects the scaling/interpolation algorithm
SwsFilter *srcFilter, SwsFilter *dstFilter: related to chroma/luminance filtering; NULL is usually sufficient
const double *param: extra parameters for the scaler; NULL is usually sufficient

Return value:

A pointer to a SwsContext, or NULL on error

int flags: selects the algorithm

    #define SWS_FAST_BILINEAR     1   // fast bilinear scaling
    #define SWS_BILINEAR          2   // bilinear scaling
    #define SWS_BICUBIC           4   // bicubic scaling
    #define SWS_X                 8
    #define SWS_POINT          0x10
    #define SWS_AREA           0x20
    #define SWS_BICUBLIN       0x40
    #define SWS_GAUSS          0x80
    #define SWS_SINC          0x100
    #define SWS_LANCZOS       0x200
    #define SWS_SPLINE        0x400

Choose the algorithm according to your needs; see the article "How to choose a scaling algorithm in swscale".

There is another function that returns a SwsContext: sws_getCachedContext. It checks whether the SwsContext passed in matches the other arguments. If it does, the same pointer is returned and the context is reused. If it does not, the old context is freed and a new SwsContext is created from the arguments and returned.

    struct SwsContext *sws_getCachedContext(struct SwsContext *context,
                                            int srcW, int srcH, enum AVPixelFormat srcFormat,
                                            int dstW, int dstH, enum AVPixelFormat dstFormat,
                                            int flags, SwsFilter *srcFilter,
                                            SwsFilter *dstFilter, const double *param);

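av_image_fill_arrays

This function sets up the data pointers and linesize values from the pixel format, the buffer, and the width and height.

    int av_image_fill_arrays(uint8_t *dst_data[4], int dst_linesize[4],
                             const uint8_t *src, enum AVPixelFormat pix_fmt,
                             int width, int height, int align);

Parameters:

uint8_t *dst_data[4]: the data pointers to fill in
int dst_linesize[4]: the linesize values of the image to fill in
const uint8_t *src: the buffer that contains, or will later contain, the actual image data
enum AVPixelFormat pix_fmt: pixel format of the image
int width: width of the image
int height: height of the image
int align: the alignment in bytes applied to the linesizes within src (1 means no padding)

Return value:

The size in bytes required for src on success, a negative error code on failure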

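sws_scale

sws_scale converts the source image into a destination image with the properties configured in the SwsContext. The planes passed in srcSlice must contain consecutive rows of the source image.

    int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
                  const int srcStride[], int srcSliceY, int srcSliceH,
                  uint8_t *const dst[], const int dstStride[]);

Parameters:

struct SwsContext *c: the SwsContext created earlier
const uint8_t *const srcSlice[]: array of pointers to the planes of the source data; in yuv420p, for example, y, u and v are stored separately and can be seen as three uint8_t arrays
const int srcStride[]: the stride (bytes per row) of each source plane
int srcSliceY: the y position at which processing starts
int srcSliceH: the height of the source slice, i.e. the number of rows to process
uint8_t *const dst[]: array of pointers to the planes of the destination image
const int dstStride[]: the stride (bytes per row) of each destination plane

Return value:

The height of the output slice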

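In an RGB image the R, G and B values are interleaved, so they cannot be separated into three contiguous arrays. All of the data is stored in data[0], which is why the Mat can be assigned directly with img.data = rgbFrame->data[0].

For YUV image data, assigning the Mat in the same way with img.data = rgbFrame->data[0] does not work, because data[0] holds only the Y plane.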

References

WeChat official account: 八小时程序员

FFmpeg4入门05：解码视频流过程 (FFmpeg 4 Primer 05: The Video Stream Decoding Process)
